Variational empirical Bayes variable selection in high-dimensional logistic regression
Logistic regression involving high-dimensional covariates is a practically important problem. Often the goal is variable selection, i.e., determining which few of the many covariates are associated with the binary response. Unfortunately, the usual Bayesian computations can be quite challenging and expensive. Here we start with a recently proposed empirical Bayes solution, with strong theoretical convergence properties, and develop a novel and computationally efficient variational approximation thereof. One such novelty is that we develop this approximation directly for the marginal distribution on the model space, rather than on the regression coefficients themselves. We demonstrate the method's strong performance in simulations, and prove that our variational approximation inherits the strong selection consistency property satisfied by the posterior distribution that it is approximating.
- Asia > Middle East > Jordan (0.04)
- Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.04)
- North America > United States > North Carolina (0.04)
- Research Report > New Finding (0.72)
- Research Report > Experimental Study (0.72)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)
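The abstract above approximates the posterior on the model space directly, i.e., on the inclusion indicators $\gamma \in \{0,1\}^p$ rather than on the coefficients. A generic mean-field family over those indicators — the standard form for this kind of approximation, not a transcription of the paper's specific construction — can be written as:

```latex
q_\phi(\gamma) \;=\; \prod_{j=1}^{p} \phi_j^{\gamma_j}\,(1-\phi_j)^{1-\gamma_j},
\qquad
\hat\phi \;=\; \operatorname*{arg\,min}_{\phi \in [0,1]^p}
\mathrm{KL}\!\bigl(q_\phi \,\big\|\, \pi^n(\cdot \mid \text{data})\bigr),
```

where $\phi_j$ plays the role of an approximate posterior inclusion probability for covariate $j$, and variable selection thresholds these $\phi_j$'s.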
Reviews: The Impact of Regularization on High-dimensional Logistic Regression
Originality: This paper develops asymptotic theory for high-dimensional regularized logistic regression (LR). The main result of the paper (Theorem 1) is proved for any locally Lipschitz function \Psi, which then in special cases yields asymptotics for common descriptive statistics such as the correlation, variance, and mean-squared error. Special-case results for L1- and L2-regularized LR are also derived, along with the quantities highlighted in 1 above. The paper also demonstrates that the numerical simulation results align with the theoretical relations. Quality: The paper contains high-quality results and proofs, and the notation and setup are well defined in Section 2 before the main results.
- Research Report > New Finding (0.63)
- Research Report > Experimental Study (0.63)
Reviews: The Impact of Regularization on High-dimensional Logistic Regression
The authors study the limiting distribution of certain functionals of the penalized maximum likelihood estimator in regression. The paper contains nontrivial new extensions of the work of Sur and Candes in the unpenalized case, and is well-written and interesting. The reviews were mostly positive and the paper is in good shape.
- Research Report > New Finding (0.40)
- Research Report > Experimental Study (0.40)
SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression
Logistic regression remains one of the most widely used tools in applied statistics, machine learning and data science. However, in moderately high-dimensional problems, where the number of features d is a non-negligible fraction of the sample size n, the logistic regression maximum likelihood estimator (MLE), and statistical procedures based on the large-sample approximation of its distribution, behave poorly. Recently, Sur and Candès (2019) showed that these issues can be corrected by applying a new approximation of the MLE's sampling distribution in this high-dimensional regime. Unfortunately, these corrections are difficult to implement in practice, because they require an estimate of the \emph{signal strength}, which is a function of the underlying parameters \beta of the logistic regression. To address this issue, we propose SLOE, a fast and straightforward approach to estimate the signal strength in logistic regression.
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
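The regime the SLOE abstract describes is easy to set up numerically. Below is a minimal sketch of fitting the plain logistic MLE by Newton's method with d a non-negligible fraction of n; the dimensions, signal pattern, and function names are illustrative assumptions, and SLOE's actual correction of the downstream inference is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 100                      # d/n = 0.1: moderately high-dimensional
X = rng.standard_normal((n, d))
beta = np.zeros(d)
beta[:10] = 0.2                       # hypothetical sparse signal
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta))).astype(float)

def fit_logistic_mle(X, y, iters=25):
    """Plain Newton's method on the logistic log-likelihood (no penalty)."""
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = 1.0 / (1.0 + np.exp(-X @ b))       # fitted probabilities
        grad = X.T @ (y - mu)                   # score vector
        W = mu * (1.0 - mu)                     # Newton weights
        H = X.T @ (X * W[:, None]) + 1e-10 * np.eye(X.shape[1])
        b += np.linalg.solve(H, grad)           # Newton update
    return b

beta_hat = fit_logistic_mle(X, y)
```

In this regime the classical large-sample approximation treats `beta_hat` as unbiased; the Sur-Candès theory cited above shows it is in fact inflated, which is what makes an estimate of the signal strength necessary.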
The Impact of Regularization on High-dimensional Logistic Regression
Salehi, Fariborz, Abbasi, Ehsan, Hassibi, Babak
Logistic regression is commonly used for modeling dichotomous outcomes. In the classical setting, where the number of observations is much larger than the number of parameters, properties of the maximum likelihood estimator in logistic regression are well understood. Recently, Sur and Candes \cite{sur2018modern} have studied logistic regression in the high-dimensional regime, where the number of observations and parameters are comparable, and show, among other things, that the maximum likelihood estimator is biased. In the high-dimensional regime the underlying parameter vector is often structured (sparse, block-sparse, finite-alphabet, etc.) and so in this paper we study regularized logistic regression (RLR), where a convex regularizer that encourages the desired structure is added to the negative of the log-likelihood function. An advantage of RLR is that it allows parameter recovery even for instances where the (unconstrained) maximum likelihood estimate does not exist.
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
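The RLR setup in the abstract above — a convex regularizer added to the negative log-likelihood — can be sketched for the L1 (sparsity-encouraging) case with proximal gradient descent. This is a generic lasso-logistic sketch, not the paper's analysis; the dimensions, penalty level, and step-size bound are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def l1_logistic(X, y, lam, iters=500):
    """Proximal gradient (ISTA) for: minimize -loglik(b) + lam * ||b||_1."""
    # Step size 1/L, with L = ||X||_2^2 / 4 an upper bound on the
    # spectral norm of the logistic negative log-likelihood's Hessian.
    step = 4.0 / (np.linalg.norm(X, 2) ** 2)
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = 1.0 / (1.0 + np.exp(-X @ b))
        grad = X.T @ (mu - y)                    # gradient of the neg. log-lik.
        b = soft_threshold(b - step * grad, step * lam)
    return b

rng = np.random.default_rng(1)
n, d = 200, 400                    # p > n: the unconstrained MLE does not exist
X = rng.standard_normal((n, d))
beta = np.zeros(d)
beta[:5] = 1.0                     # hypothetical sparse truth
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta))).astype(float)
beta_hat = l1_logistic(X, y, lam=15.0)
```

Note that `p > n` here, so the unpenalized MLE is undefined, yet the penalized estimate is finite and sparse — the advantage of RLR the abstract points out.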
The phase transition for the existence of the maximum likelihood estimate in high-dimensional logistic regression
Candes, Emmanuel J., Sur, Pragya
This paper rigorously establishes that the existence of the maximum likelihood estimate (MLE) in high-dimensional logistic regression models with Gaussian covariates undergoes a sharp `phase transition'. We introduce an explicit boundary curve $h_{\text{MLE}}$, parameterized by two scalars measuring the overall magnitude of the unknown sequence of regression coefficients, with the following property: in the limit of a large sample size $n$ and number of features $p$, proportioned so that $p/n \rightarrow \kappa$, we show that if the problem is sufficiently high dimensional in the sense that $\kappa > h_{\text{MLE}}$, then the MLE does not exist with probability one. Conversely, if $\kappa < h_{\text{MLE}}$, the MLE asymptotically exists with probability one.
- North America > United States > California > Santa Clara County > Stanford (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Research Report > New Finding (0.49)
- Research Report > Experimental Study (0.35)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.71)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.71)
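The mechanism behind MLE nonexistence in the abstract above is linear separability: once some hyperplane perfectly separates the two classes, the logistic likelihood can always be increased by scaling the coefficients up, so no finite maximizer exists. The toy demonstration below constructs separable data directly (rather than probing the paper's boundary curve $h_{\text{MLE}}$) and shows the iterate norm of gradient ascent growing without settling; all names and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 20
X = rng.standard_normal((n, p))
w = rng.standard_normal(p)
y = (X @ w > 0).astype(float)   # noiseless labels: perfectly separable by w

def mle_norm_path(X, y, steps, lr=0.1):
    """Gradient ascent on the logistic log-likelihood, recording ||b||.
    With separable data the likelihood has no maximizer, so ||b|| diverges
    (slowly, roughly logarithmically) instead of converging."""
    b = np.zeros(X.shape[1])
    norms = []
    for _ in range(steps):
        mu = 1.0 / (1.0 + np.exp(-X @ b))
        b += lr * X.T @ (y - mu) / len(y)   # averaged score step
        norms.append(np.linalg.norm(b))
    return norms

norms = mle_norm_path(X, y, steps=2000)
```

In the paper's terms, separability events like this occur with probability tending to one exactly when $\kappa$ exceeds the boundary curve, which is why the MLE's existence exhibits a phase transition.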